Extracting Citation Metadata from Online Publication Lists Using BLAST
Abstract
Scientific research reports cite extensively, so an automatic citation tool would be very useful. Because citations come in so many styles, automatically transforming semi-structured citation data into structured citations is difficult. Some digital library projects, such as ResearchIndex (CiteSeer) and OpCit, have attempted automatic citation parsing. Our method recognizes citation metadata with a gene sequence alignment tool. Known semi-structured citations are transformed into protein sequences and saved in a template database. To parse a new semi-structured citation, we translate it into a protein sequence in the same way and then use BLAST (Basic Local Alignment Search Tool), a sequence alignment tool, to find the template in the database most similar to that sequence. The metadata are then parsed according to the matched template. Using the 2,500 templates generated by our template-generating system as the template database and parsing all 2,500 citations with our parsing system, we obtain 89% precision. With the same template database, ParaTools obtains 79% precision; with its default database of about 400 templates, ParaTools obtains only 30% precision.
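The template-matching idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' system: the token classes and their one-letter amino-acid codes, the field labels, the helper names (`tokenize`, `encode`, `build_template`, `local_align`, `parse`) and the two example templates are all hypothetical. In place of BLAST, which is a heuristic search tool, the sketch uses exact Smith-Waterman local alignment with simple match/mismatch/gap scores rather than a substitution matrix such as BLOSUM62. The rule for query tokens the alignment leaves unmatched (punctuation becomes a separator, other tokens inherit the preceding label) is also an assumption.

```python
import re

# Hypothetical alphabet: each token class becomes one amino-acid letter.
# (Assumption for illustration; the paper's actual encoding is not given.)
PUNCT = {",": "K", ".": "P", ":": "D", ";": "E", "(": "L", ")": "R"}


def tokenize(text):
    """Split into initials like 'J.', words/numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z]\.|\w+|[^\w\s]", text)


def encode(tok):
    """Map one token to a one-letter residue code (hypothetical token classes)."""
    if re.fullmatch(r"(19|20)\d{2}", tok):
        return "Y"  # four-digit year
    if tok.isdigit():
        return "N"  # other numbers (volume, pages)
    if re.fullmatch(r"[A-Za-z]\.", tok):
        return "I"  # author initial
    if tok in PUNCT:
        return PUNCT[tok]
    if tok[0].isupper():
        return "C"  # capitalized word
    if tok[0].isalpha():
        return "W"  # lowercase word
    return "G"  # anything else


def build_template(fields):
    """fields: ordered (label, text) pairs from a citation whose structure is known."""
    seq, labels = [], []
    for label, text in fields:
        for tok in tokenize(text):
            seq.append(encode(tok))
            labels.append(label)
    return "".join(seq), labels


def local_align(q, t, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment (a stand-in for BLAST); returns (score, pairs)."""
    H = [[0] * (len(t) + 1) for _ in range(len(q) + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, len(q) + 1):
        for j in range(1, len(t) + 1):
            s = match if q[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    pairs, (i, j) = [], best_pos
    while i > 0 and j > 0 and H[i][j] > 0:  # trace back to the start of the local hit
        s = match if q[i - 1] == t[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))  # query token aligned to template token
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1  # query token left unaligned
        else:
            j -= 1  # template token skipped
    return best, pairs[::-1]


def parse(citation, templates):
    """Label each token using the best-scoring template; return (score, fields)."""
    toks = tokenize(citation)
    query = "".join(encode(t) for t in toks)
    score, pairs, labels = max(
        (local_align(query, seq) + (lab,) for seq, lab in templates),
        key=lambda r: r[0],
    )
    tok_label = [None] * len(toks)
    for qi, ti in pairs:
        tok_label[qi] = labels[ti]  # copy the template's field label
    for k, tok in enumerate(toks):
        if tok_label[k] is None:
            if tok in PUNCT:
                tok_label[k] = "sep"  # unmatched punctuation: treat as a separator
            elif k > 0:
                tok_label[k] = tok_label[k - 1]  # inherit the preceding label
    fields = {}
    for tok, label in zip(toks, tok_label):
        if label not in (None, "sep"):
            fields.setdefault(label, []).append(tok)
    return score, {k: " ".join(v) for k, v in fields.items()}


# Two hypothetical templates built from citations with known structure.
TEMPLATES = [
    build_template([("author", "Smith"), ("sep", ","), ("author", "J."), ("sep", "("),
                    ("year", "2001"), ("sep", ")"), ("title", "Mining the web"), ("sep", ".")]),
    build_template([("title", "Parsing citations"), ("sep", "."), ("author", "Lee"), ("sep", ","),
                    ("author", "K."), ("sep", "."), ("year", "1999"), ("sep", ".")]),
]

if __name__ == "__main__":
    print(parse("Brown, A. (2004) Extracting citation metadata with alignment.", TEMPLATES))
```

Because the alignment is local, a citation whose title is longer than the template's still matches: the example query aligns best to the first template (score 18), and the extra title words inherit the `title` label, giving `{'author': 'Brown A.', 'year': '2004', 'title': 'Extracting citation metadata with alignment'}`. A system working at the scale reported above would more plausibly index its templates with BLAST (for example with the BLAST+ `makeblastdb` and `blastp` tools) than align exhaustively against each one as this sketch does.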
Similar resources
Publication Delay in Scientific Journals: A Study of Journals Accredited by Iran's Ministry of Science, Research and Technology
Publication delay is a negative phenomenon in scientific information dissemination. The current research studies the publication delay of scientific journals accredited by the Ministry of Science, Research & Technology of Iran. It also investigates the association between journals’ characteristics and their publication lag. This study employs the applied research method. All 1156 journals of ...
The online attention to certain nuclear medicine topics: An altmetrics study vs. a citation analysis
Introduction: Traditional citation analysis has been heavily criticized because citations take considerable time to accumulate after publication. The term “altmetrics” was therefore proposed in 2010 to measure the scientific and social impact of a paper. We performed a search for certain nuclear medicine topics using the altmetrics approach to report the correlation b...
New Methods for Metadata Extraction from Scientific Literature
Spreading ideas and announcing new discoveries and findings in the scientific world is typically realized by publishing and reading scientific literature. Within the past few decades we have witnessed a digital revolution, which moved scholarly communication to electronic media and also resulted in a substantial increase in its volume. Nowadays keeping track of the latest scientific achieve...
Extracting Procedural Models Using Educational Data Mining
ISSN 1436-4522 (online) and 1176-3647 (print). © International Forum of Educational Technology & Society (IFETS). The authors and the forum jointly retain the copyright of the articles. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage an...
‘Ocean biodiversity informatics’: a new era in marine biology research and management
Ocean biodiversity informatics (OBI) is the use of computer technologies to manage marine biodiversity information, including data capture, storage, search, retrieval, visualisation, mapping, modelling, analysis and publication. The latest information systems are open-access, making data and/or information publicly available over the Internet. This ranges from primary data on species occurrence...
Publication date: 2004